Abstract

We present a memory-based learning (MBL) approach to shallow
parsing in which POS tagging, chunking, and identification of
syntactic relations are formulated as memory-based modules. The
experiments reported in this paper show competitive results: the
F_{β=1} scores on the Wall Street Journal (WSJ) treebank are 93.8%
for NP chunking, 94.7% for VP chunking, 77.1% for subject
detection and 79.0% for object detection.

Introduction 

Recently, there has been an increased interest in approaches to
automatically learning to recognize shallow linguistic patterns in
text [Ramshaw and Marcus, 1995, Vilain and Day, 1996,
Argamon et al., 1998, Buchholz, 1994, Cardie and Pierce, 1998,
Veenstra, 1998, Daelemans et al., 1999a]. Shallow parsing is an
important component of most text analysis systems in applications
such as information extraction and summary generation. It includes
discovering the main constituents of sentences (NPs, VPs, PPs) and
their heads, and determining syntactic relationships such as
subject, object, and adjunct relations between verbs and heads of
other constituents.

Memory-Based Learning (MBL) shares with other statistical and
learning techniques the advantages of avoiding the need for manual
definition of patterns (common practice is to use hand-crafted
regular expressions), and of being reusable for different corpora
and sublanguages. The unique property of memory-based approaches
which sets them apart from other learning methods is the fact that
they are lazy learners: they keep all training data available for
extrapolation. All other statistical and machine learning methods
are eager (or greedy) learners: they abstract knowledge structures
or probability distributions from the training data, forget the
individual training instances, and extrapolate from the induced
structures. Lazy learning techniques have been shown to achieve
higher accuracy than eager methods for many language processing
tasks. A reason for this is the intricate interaction between
regularities, subregularities and exceptions in most language
data, and the related problem for learners of distinguishing noise
from exceptions. Eager learning techniques abstract from what they
consider noise (hapaxes, low-frequency events, non-typical events)
whereas lazy learning techniques keep all data available,
including exceptions which may sometimes be productive. For a
detailed analysis of this issue, see [Daelemans et al., 1999a].
Moreover, the automatic feature weighting in the similarity metric
of a memory-based learner makes the approach well-suited for
domains with large numbers of features from heterogeneous sources,
as it embodies a smoothing-by-similarity method when data is
sparse [Zavrel and Daelemans, 1997].



In this paper, we will provide an empirical evaluation of the MBL
approach to syntactic analysis on a number of shallow pattern
learning tasks: NP chunking, VP chunking, and the assignment of
subject-verb and object-verb relations. The approach is evaluated
by cross-validation on the WSJ treebank corpus
[Marcus et al., 1993]. We compare the approach qualitatively and,
as far as possible, quantitatively with other approaches.

Memory-Based Shallow Syntactic Analysis

Memory-Based Learning (MBL) is a classification- 
based, supervised learning approach: a memory-based 
learning algorithm constructs a classifier for a task by 
storing a set of examples. Each example associates a 
feature vector (the problem description) with one of a 
finite number of classes (the solution). Given a new 
feature vector, the classifier extrapolates its class from 
those of the most similar feature vectors in memory. 
The metric defining similarity can be automatically 
adapted to the task at hand. 

In our approach to memory-based syntactic pattern recognition, we
carve up the syntactic analysis process into a number of such
classification tasks with input vectors representing a focus item
and a dynamically selected surrounding context. As in Natural
Language Processing problems in general [Daelemans, 1995], these
classification tasks can be segmentation tasks (e.g. decide
whether a focus word or tag is the start or end of an NP) or
disambiguation tasks (e.g. decide whether a chunk is the subject
NP, the object NP or neither). Output of some memory-based modules
(e.g. a tagger or a chunker) is used as input by other
memory-based modules (e.g. syntactic relation assignment).

Similar cascading ideas have been explored in other approaches to
text analysis: e.g. finite state partial parsing [Abney, 1996,
Grefenstette, 1996], statistical decision tree parsing
[Magerman, 1994], maximum entropy parsing [Ratnaparkhi, 1997], and
memory-based learning [Cardie, 1994, Daelemans et al., 1996].



Algorithms and Implementation 

For our experiments we have used TiMBL^1, an MBL software package
developed in our group [Daelemans et al., 1999b]. We used the
following variants of MBL:



• IB1-IG: The distance between a test item and each memory item
is defined as the number of features for which they have a
different value (overlap metric). Since in most cases not all
features are equally relevant for solving the task, the algorithm
uses information gain (an information-theoretic notion measuring
the reduction of uncertainty about the class to be predicted when
knowing the value of a feature) to weight the cost of a feature
value mismatch during comparison. The class of the most similar
training item is then predicted to be the class of the test item.
Classification time is linear in the number of training instances
times the number of features.

• IGTree: IB1-IG is expensive in basic memory and processing
requirements. With IGTree, an oblivious decision tree is created
with features as tests, ordered according to the information gain
of the features, as a heuristic approximation of the
computationally more expensive pure MBL variants. Classification
time is linear in the number of features times the average
branching factor in the tree, which is less than or equal to the
average number of values per feature.
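
The IB1-IG variant described above can be illustrated with a minimal Python sketch. It is not the TiMBL implementation: the class name `IB1IG`, the toy training data, and the tie-breaking rule (the first nearest neighbour wins) are assumptions made only for this illustration. The sketch computes one information-gain weight per feature and classifies a test item by the class of the training item with the smallest weighted overlap distance.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(instances, labels, feature_index):
    """Reduction of class entropy obtained by knowing one feature's value."""
    by_value = defaultdict(list)
    for features, label in zip(instances, labels):
        by_value[features[feature_index]].append(label)
    remainder = sum(len(subset) / len(labels) * entropy(subset)
                    for subset in by_value.values())
    return entropy(labels) - remainder


class IB1IG:
    """Nearest-neighbour classifier with an IG-weighted overlap metric."""

    def fit(self, instances, labels):
        # Lazy learning: all training items are simply kept in memory.
        self.instances = list(instances)
        self.labels = list(labels)
        n_features = len(self.instances[0])
        self.weights = [information_gain(self.instances, self.labels, i)
                        for i in range(n_features)]
        return self

    def distance(self, a, b):
        # Sum the weights of the features on which the two items differ.
        return sum(w for w, x, y in zip(self.weights, a, b) if x != y)

    def predict(self, item):
        # Linear scan over memory; ties go to the earliest training item.
        best = min(range(len(self.instances)),
                   key=lambda i: self.distance(item, self.instances[i]))
        return self.labels[best]


# Toy data (hypothetical): (POS of previous word, POS of focus word) -> tag.
train = [("DT", "NN"), ("NN", "VBD"), ("DT", "JJ"),
         ("NN", "NN"), ("JJ", "VBD"), ("JJ", "NN")]
tags = ["I", "O", "I", "I", "O", "I"]
clf = IB1IG().fit(train, tags)
prediction = clf.predict(("DT", "VBD"))
```

In this toy set the focus POS tag receives a much higher weight than the preceding POS tag, so the unseen pair ("DT", "VBD") is matched to a training item that shares the focus tag rather than to one that shares the left context.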

For more references and information about these algorithms we
refer to [Daelemans et al., 1999b, Daelemans et al., 1999a]. In
[Daelemans et al., 1996] both algorithms are explained in detail
in the context of MBT, a memory-based POS tagger, which we
presuppose as an available module in this paper. In the remainder
of this paper, we discuss results on the different tasks in
section Experiments, and compare our approach to alternative
learning methods in section Discussion and Related Research.

1 TiMBL is available from: http://ilk.kub.nl/

Experiments 

We carried out two series of experiments. In the first 
we evaluated a memory-based NP and VP chunker, in 
the second we used this chunker for memory-based sub- 
ject/object detection. 

To evaluate the performance of our trained memory-based
classifiers, we will use four measures: accuracy (the percentage
of correctly predicted output classes), precision (the percentage
of predicted chunks or subject- or object-verb pairs that is
correct), recall (the percentage of chunks or subject- or
object-verb pairs to be predicted that is found), and F_β
[C.J. van Rijsbergen, 1979], which is given by

$$F_\beta = \frac{(\beta^2 + 1) \cdot prec \cdot rec}{\beta^2 \cdot prec + rec}$$

with β = 1. See below for an example.

For the chunking tasks, we evaluated the algorithms by
cross-validation on all 25 partitions of the WSJ treebank. Each
partition in turn was selected as a test set, and the algorithms
were trained on the remaining partitions. Average precision and
recall on the 25 partitions will be reported for both the IB1-IG
and IGTree variants of MBL. For the subject/object detection task,
we used 10-fold cross-validation on treebank partitions 00-09. In
section Discussion and Related Research we will further evaluate
our chunkers and subject/object detectors.

Chunking 

Following [Ramshaw and Marcus, 1995] we defined chunking as a
tagging task: each word in a sentence is assigned a tag which
indicates whether this word is inside or outside a chunk. We used
as tagset:

I_NP inside a baseNP. 

O outside a baseNP or a baseVP. 

B_NP inside a baseNP, but the preceding word is in 
another baseNP. 

I_VP and B_VP are used in a similar fashion. 

Since baseNPs and baseVPs are non-overlapping and non-recursive,
these five tags suffice to unambiguously chunk a sentence. For
example, the sentence:

[np Pierre Vinken np] , [np 61 years np] old , [vp will join vp]
[np the board np] as [np a nonexecutive director np]
[np Nov. 29 np] .

should be tagged as: 



Methods          context   accuracy   precision   recall   F_{β=1}

NPs
IGTree           2-1       97.5       91.8        93.1     92.4
IB1-IG           2-1       98.0       93.7        94.0     93.8
baseline words             92.9       76.2        79.7     77.9
baseline POS               94.7       79.5        82.4     80.9

VPs
IGTree           2-1       99.0       93.0        94.2     93.6
IB1-IG           2-1       99.2       94.0        95.5     94.7
baseline words             95.5       67.5        73.4     70.3
baseline POS               97.3       74.7        87.7     81.2

Table 1: Overview of the NP/VP chunking scores of 25-fold cross-validation on the WSJ using IB1-IG with a context
of two words and POS right and one left, and of using IGTree with the same context. The baseline scores are
computed with IGTree using only the focus POS tag or the focus word.



Feature  1    2    3    4     5    6        7     8     9    10       11    12      13   Class
Weight   39   40   4    3     2    10       12    18    29   18       31    13      24
Inst. 1  -1             seen  VBN                            sisters  PRP$  seen    VBN  S
Inst. 2  1              seen  VBN  sisters  PRP$  seen  VBN  man      NN    lately  RB
Inst. 3  2              seen  VBN  seen     VBN   man   NN   lately   RB

Table 2: Some sample instances for the subject/object detection task. The second row shows the relative weight of
the features (truncated and multiplied by 100; from one of the 10 cross-validation experiments). Thus the order of
importance of the features is: 2, 1, 11, 9, 13, 10, 8, 12, 7, 6, 3, 4, 5.



Pierre/I_NP Vinken/I_NP ,/O 61/I_NP years/I_NP old/O ,/O
will/I_VP join/I_VP the/I_NP board/I_NP as/O a/I_NP
nonexecutive/I_NP director/I_NP Nov./B_NP 29/I_NP ./O
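
The mapping from a bracketed sentence to the five-tag representation can be sketched in Python as follows. The input format (a list of `(chunk_type, words)` segments, with `None` for material outside any chunk) and the function name `chunks_to_tags` are assumptions introduced for this illustration only; they are not taken from the software used in the experiments.

```python
def chunks_to_tags(segments):
    """Convert (chunk_type, words) segments into one chunk tag per word.

    chunk_type is "NP", "VP", or None for words outside any chunk.
    A chunk-initial word gets B_<type> only when the preceding word
    belongs to another chunk of the same type; otherwise I_<type>.
    """
    tags = []
    previous_type = None  # chunk type of the preceding word, if any
    for chunk_type, words in segments:
        for position in range(len(words)):
            if chunk_type is None:
                tags.append("O")
            elif position == 0 and previous_type == chunk_type:
                tags.append("B_" + chunk_type)
            else:
                tags.append("I_" + chunk_type)
        if words:
            previous_type = chunk_type
    return tags


# The example sentence, segmented by hand for this illustration.
sentence = [
    ("NP", ["Pierre", "Vinken"]), (None, [","]),
    ("NP", ["61", "years"]), (None, ["old"]), (None, [","]),
    ("VP", ["will", "join"]), ("NP", ["the", "board"]), (None, ["as"]),
    ("NP", ["a", "nonexecutive", "director"]),
    ("NP", ["Nov.", "29"]), (None, ["."]),
]
tags = chunks_to_tags(sentence)
```

Only "Nov." receives a B_NP tag here, because it is the only chunk-initial word that directly follows another baseNP.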



Suppose that our classifier erroneously tagged director as B_NP
instead of I_NP, but classified the rest correctly. Accuracy would
then be 17/18 = 0.94. The resulting chunks would be
[np a nonexecutive np] [np director np] instead of
[np a nonexecutive director np] (the other chunks being the same
as above). Then out of the seven predicted chunks, five are
correct (precision = 5/7 = 71.4%) and from the six chunks that
were to be found, five were indeed found (recall = 5/6 = 83.3%).
F_{β=1} is 76.9%.
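
The worked example can be checked with a short Python sketch that recovers chunks from tag sequences and computes the four measures. The function names `tags_to_chunks` and `evaluate` are hypothetical, and the sketch assumes that both the gold and the predicted sequence contain at least one chunk (otherwise precision or recall would be undefined).

```python
def tags_to_chunks(tags):
    """Return the set of (type, start, end) chunks encoded by chunk tags."""
    chunks, start, current = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes last chunk
        kind = None if tag == "O" else tag[2:]
        begins_new = tag.startswith("B_") or kind != current
        if current is not None and begins_new:
            chunks.add((current, start, i - 1))
            current = None
        if kind is not None and begins_new:
            start, current = i, kind
    return chunks


def evaluate(gold_tags, predicted_tags, beta=1.0):
    """Tag accuracy plus chunk-level precision, recall and F_beta."""
    gold = tags_to_chunks(gold_tags)
    predicted = tags_to_chunks(predicted_tags)
    correct = len(gold & predicted)
    accuracy = (sum(g == p for g, p in zip(gold_tags, predicted_tags))
                / len(gold_tags))
    precision = correct / len(predicted)
    recall = correct / len(gold)
    f_beta = ((beta ** 2 + 1) * precision * recall
              / (beta ** 2 * precision + recall))
    return accuracy, precision, recall, f_beta


gold = ["I_NP", "I_NP", "O", "I_NP", "I_NP", "O", "O",
        "I_VP", "I_VP", "I_NP", "I_NP", "O",
        "I_NP", "I_NP", "I_NP", "B_NP", "I_NP", "O"]
predicted = list(gold)
predicted[14] = "B_NP"  # "director" mistagged as B_NP instead of I_NP

accuracy, precision, recall, f1 = evaluate(gold, predicted)
print(f"{accuracy:.3f} {precision:.3f} {recall:.3f} {f1:.3f}")
# prints: 0.944 0.714 0.833 0.769
```

A single tagging error thus costs two chunk-level errors in precision (both fragments of the split chunk are wrong) but only one in recall, which is why F_{β=1} drops much further than accuracy.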

The features for the experiments are the word form and the POS tag
(as provided by the WSJ treebank) of the two words to the left,
the focus word, and one word to the right. For the results see
Table 1.
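
The construction of one feature vector per focus word might be sketched as below. The padding symbol `PAD` for positions beyond the sentence boundary and the function name `window_features` are assumptions made for this illustration; the source does not specify how sentence boundaries are represented.

```python
PAD = "_"  # hypothetical filler for positions outside the sentence


def window_features(words, pos_tags, left=2, right=1):
    """Build one instance per word: word form and POS tag of each position
    in a window of `left` words before and `right` words after the focus."""
    instances = []
    for i in range(len(words)):
        features = []
        for j in range(i - left, i + right + 1):
            inside = 0 <= j < len(words)
            features.append(words[j] if inside else PAD)
            features.append(pos_tags[j] if inside else PAD)
        instances.append(tuple(features))
    return instances


words = ["the", "board", "as"]
pos_tags = ["DT", "NN", "IN"]
instances = window_features(words, pos_tags)
# instances[1] describes the focus word "board":
# ('_', '_', 'the', 'DT', 'board', 'NN', 'as', 'IN')
```

With the default window of two positions to the left and one to the right, every instance has eight symbolic features, to which the chunk tag of the focus word is attached as the class during training.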

The baseline for these experiments is computed with IB1-IG, using
as the only feature either i) the focus word or ii) the focus POS
tag.

The results of the chunking experiments show that accurate
chunking is possible, with F_{β=1} values around 94%.



Subject/Object Detection 

Finding a subject or object (or any other relation of a 
constituent to a verb) is defined in our classification- 
based approach as a mapping from a pair of words (the 
verb and the head of the constituent) and a represen- 
tation of its context to a class describing the type of 
relation (e.g. subject, object, or neither). A verb can 
have a subject or object relation to more than one word 
in case of NP coordination, and a word can be the sub- 
ject of more than one verb in case of VP coordination. 

Data Format 

In our representation, the tagged and chunked sentence

[NP My/PRP$ sisters/NNS NP] [VP have/VBP not/RB seen/VBN VP]
[NP the/DT old/JJ man/NN NP] lately/RB ./.

will result in the instances in Table 2.

                            Together                      Subjects               Objects
# relations                 51629                         32755                  18874
Method                      acc.   prec.  rec.   F_{β=1}  prec.  rec.   F_{β=1}  prec.  rec.   F_{β=1}
Random baseline                    3.9    3.9    3.9      4.5    4.5    4.5      2.7    2.5    2.6
Heuristic baseline                 65.9   66.5   66.2     69.3   61.6   65.2     61.6   75.1   67.7
IGTree                      96.9   79.5   73.2   76.2     80.9   71.4   75.8     77.2   76.4   76.8
IB1-IG                      96.6   74.4   76.9   75.6     76.2   76.9   76.5     71.5   76.7   74.0
IGTree & IB1-IG unanimous   97.4   89.8   68.6   77.8     89.7   67.6   77.1     89.8   70.4   79.0

Table 3: Results of the 10-fold cross validation experiment on the subject-verb/object-verb relations data. We
trained one classifier to detect subjects as well as objects. Its performance can be found in the column Together.
For expository reasons, we also mention how well this classifier performs when computing precision and recall for
subjects and objects separately.

Classes are S(ubject), O(bject) or "-" (for anything else).
Features are:

1 the distance from the verb to the head (a chunk just counts for
one word; a negative distance means that the head is to the left
of the verb),

2 the number of other baseVPs between the verb and the head (in
the current setting, this can maximally be one),

3 the number of commas between the verb and the head,

4 the verb, and

5 its POS tag,

6-9 the two left context words/chunks of the head, represented by
the word and its POS,

10-11 the head itself, and

12-13 its right context word/chunk.

Features one to three are numeric features. This property can only
be exploited by IB1-IG; IGTree treats them as symbolic. We also
tried four additional features that indicate the sort of chunk
(NP, VP or none) of the head and the three context elements
respectively. These features did not improve performance,
presumably because this information is mostly inferrable from the
POS tag.

To find subjects and objects in a test sentence, the sentence is
first POS tagged (with the Memory-Based Tagger MBT) and chunked
(see section Experiments: Chunking). Subsequently, all chunks are
reduced to their heads.^2

Then an instance is constructed for every pair of a baseVP and
another word/chunk head provided they are not too distant from
each other in the sentence. A crucial point here is the definition
of "not too distant". If our definition is too strict, we might
exclude too many actual subject-verb or object-verb pairs, which
will result in low recall. If the definition is too broad, we will
get very large training and test sets. This slows down learning
and might even have a negative effect on precision because the
learner is confronted with too much "noise". Note further that
defining distance purely as the number of intervening words or
chunks is not fully satisfactory as this does not take clause
structure into account. As one clause normally contains one
baseVP, we developed the idea of counting intervening baseVPs.
Counts on the treebank showed that less than 1% of the subjects
and objects are separated from their verbs by more than one other
baseVP. We therefore construct an instance for every pair of a
baseVP and another word/chunk head if they have not more than one
other baseVP in between them.^3

2 By definition, the head is the rightmost word of a baseNP or
baseVP.

These instances are classified by the memory-based 
learner. For the training material, the POS tags and 
chunks from the treebank are used directly. Also, 
subject-verb and object-verb relations are extracted to 
yield the class values. 
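
The instance construction described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the original implementation: the input format (a list of `(chunk_type, [(word, POS), ...])` elements), the padding symbol `PAD` for missing context, and the function names are hypothetical, and the sketch builds an instance for every non-verb element, including punctuation.

```python
PAD = "-"  # hypothetical filler for context positions outside the sentence


def reduce_to_heads(chunked):
    """Keep the chunk type and the rightmost (word, POS) pair of each element."""
    return [(kind, tokens[-1]) for kind, tokens in chunked]


def build_instances(chunked, max_vps_between=1):
    """One 13-feature instance per (baseVP, other head) pair that has at most
    `max_vps_between` other baseVPs between the verb and the head."""
    elems = reduce_to_heads(chunked)

    def context(i):
        # (word, POS) of element i, or padding beyond the sentence boundary.
        return elems[i][1] if 0 <= i < len(elems) else (PAD, PAD)

    instances = []
    for v, (v_kind, (verb, verb_pos)) in enumerate(elems):
        if v_kind != "VP":
            continue
        for h in range(len(elems)):
            if h == v:
                continue
            between = elems[min(v, h) + 1:max(v, h)]
            n_vps = sum(kind == "VP" for kind, _token in between)
            if n_vps > max_vps_between:
                continue
            n_commas = sum(word == "," for _kind, (word, _pos) in between)
            instances.append((h - v, n_vps, n_commas, verb, verb_pos,
                              *context(h - 2), *context(h - 1),
                              *context(h), *context(h + 1)))
    return instances


chunked = [
    ("NP", [("My", "PRP$"), ("sisters", "NNS")]),
    ("VP", [("have", "VBP"), ("not", "RB"), ("seen", "VBN")]),
    ("NP", [("the", "DT"), ("old", "JJ"), ("man", "NN")]),
    (None, [("lately", "RB")]),
    (None, [(".", ".")]),
]
instances = build_instances(chunked)
```

Because distances are counted over chunk heads rather than words, "man" is at distance 1 from "seen" even though two words intervene in the surface string.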

Results and discussion

The results in Table 3 show that finding (unrestricted) subjects
and objects is a hard task. The baseline of classifying instances
at random (using only the probability distribution of the classes)
is about 4%. Using the simple heuristic of classifying each
(pro)noun directly in front of the verb as S and each (pro)noun
directly after the verb as O yields a much higher baseline of
about 66%. Obviously, these are the easy cases. IGTree, which is
the better overall MBL algorithm on this task, scores 10
percentage points above this baseline, i.e. 76.2%. The difference
in accuracy between IGTree and IB1-IG is only 0.3%. In terms of
F-values, IB1-IG is better for finding subjects, whereas IGTree is
better for objects. We also note that IGTree always yields a
higher precision than recall, whereas IB1-IG does the opposite.

3 The following sentence shows a subject-verb pair (in bold) with
one intervening baseVP (in italics):

[np The plant np] , [np which np] [vp is owned vp] by
[np Hollingsworth & Vose Co. np] , [vp was vp] under
[np contract np] with [np Lorillard np] [vp to make vp]
[np the cigarette filters np] .

The next example illustrates the same for an object-verb pair:

Along [np the way np] , [np he np] [vp meets vp]
[np a solicitous Christian chauffeur np] [np who np]
[vp offers vp] [np the hero np] [np God np]
[np 's phone number np] ; and [np the Sheep Man np] ,
[np a sweet, rough-hewn figure np] [np who np] [vp wears vp] -
[np what else np] - [np a sheepskin np] .

Method            Tagger   accuracy   precision   recall   F_{β=1}
A,D&K             Brill               91.6        91.6     91.6
R&M               Brill    97.4       92.3        91.8     92.0
C&P               Brill    -          90.7        91.1     90.9
IB1-IG            Brill    97.2       91.5        91.3     91.4
IB1-IG            MBT      97.3       91.6        91.5     91.6
IB1-IG            WSJ      97.6       92.2        92.5     92.3
IB1-IG, POS only  WSJ      96.9       90.3        90.1     90.2

Table 4: Comparison of several classifiers, including MBL and MBSL, on the same dataset. The experiments with
IB1-IG are all carried out with a context of five words and POS tags to the left and three to the right.

IGTree is thus more "cautious" than IB1-IG. Presumably, this is
due to the word-valued features. Many test instances contain a
word not occurring in the training instances (in that feature). In
that case, search in the IGTree is stopped and the default class
for that node is used. As the "-" class is more than ten times
more frequent than the other two classes, there is a high chance
that this default is indeed the "-" class, which is always the
"cautious" choice. IB1-IG, on the other hand, will not stop on
encountering an unseen word, but will go on comparing the rest of
the features, which might still opt for a non-"-" class. The
differences in precision and recall surely are a topic for further
research. So far, this observation led us to combine both
algorithms by classifying an instance as S or O only if both
algorithms agreed, and as "-" otherwise. The combination yields
higher precision at the cost of recall, but the overall effect is
certainly positive (F_{β=1} = 77.8%).
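
The "cautious" behaviour and the unanimous combination can be illustrated with a simplified Python sketch. It is an assumption-laden approximation rather than the IGTree implementation in TiMBL: the tree is an uncompressed nested dictionary, the feature order is supplied by hand instead of being derived from information gain, and all names and the toy data are hypothetical.

```python
from collections import Counter


def build_tree(instances, labels, feature_order):
    """Oblivious tree: every level tests the next feature in feature_order.
    Each node stores the most frequent class below it as its default."""
    node = {"default": Counter(labels).most_common(1)[0][0], "children": {}}
    if not feature_order or len(set(labels)) == 1:
        return node
    feature, rest = feature_order[0], feature_order[1:]
    groups = {}
    for instance, label in zip(instances, labels):
        sub_instances, sub_labels = groups.setdefault(instance[feature],
                                                      ([], []))
        sub_instances.append(instance)
        sub_labels.append(label)
    for value, (sub_instances, sub_labels) in groups.items():
        node["children"][value] = build_tree(sub_instances, sub_labels, rest)
    return node


def tree_classify(tree, instance, feature_order):
    """Descend while feature values match; on an unseen value, stop and
    return the default class of the last node reached."""
    node = tree
    for feature in feature_order:
        child = node["children"].get(instance[feature])
        if child is None:
            break
        node = child
    return node["default"]


def unanimous(prediction_a, prediction_b, negative="-"):
    """Keep a relation only if both classifiers agree on it."""
    return prediction_a if prediction_a == prediction_b else negative


# Toy data (hypothetical): (verb, POS of head) -> relation class.
train = [("seen", "NN"), ("seen", "PRP"), ("met", "NN"), ("said", "RB"),
         ("said", "IN"), ("met", "RB"), ("said", "DT")]
classes = ["O", "O", "O", "-", "-", "-", "-"]
order = [0, 1]  # verb first, then head POS (order chosen by hand)
tree = build_tree(train, classes, order)

seen_verb = tree_classify(tree, ("seen", "NN"), order)      # known verb
unseen_verb = tree_classify(tree, ("bought", "NN"), order)  # unknown verb
combined = unanimous(unseen_verb, "O")
```

For the unseen verb "bought" the search stops at the root and returns the majority class "-", even though the head POS alone would suggest an object; a nearest-neighbour classifier that proposed "O" here would be overruled by the unanimity requirement.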

Discussion and Related Research 



In [Argamon et al., 1998], an alternative approach to memory-based
learning of shallow patterns, memory-based sequence learning
(MBSL), is proposed. In this approach, tasks such as base NP
chunking and subject detection are formulated as separate
bracketing tasks, with as input the POS tags of a sentence. For
every input sentence, all possible bracketings in context
(situated contexts) are hypothesised and the highest scoring ones
are used for generating a bracketed output sentence. The score of
a situated hypothesis depends on the scores of the tiles which are
part of it and the degree to which they cover the hypothesis. A
tile is defined as a substring of the situated hypothesis
containing a bracket, and the score of a tile depends on the
number of times it is found in the training material divided by
the total number of times the string of tags occurs (i.e.
including occurrences with another or no bracket). The approach is
memory-based because all training data is kept available. Similar
algorithms have been proposed for grapheme-to-phoneme conversion
by [Dedina and Nusbaum, 1991] and [Yvon, 1996], and the approach
could be seen as a linear algorithmic simplification of the DOP
memory-based approach for full parsing [Bod, 1995]. In the
remainder of this section, we show that an empirical comparison of
our computationally simpler MBL approach to MBSL on their data for
NP chunking, subject, and object detection reveals comparable
accuracies.

Chunking 



For NP chunking, [Argamon et al., 1998] used data extracted from
sections 15-18 of the WSJ as a fixed train set and section 20 as a
fixed test set, the same data as [Ramshaw and Marcus, 1995]. To
find the optimal setting of learning algorithms and feature
construction we used 10-fold cross validation on section 15; we
found IB1-IG with a context of five words and POS tags to the left
and three to the right to be a good parameter setting for the
chunking task, and we used this setting as the default setting for
our experiments. For an overview of the results see Table 4. Since
part of the chunking errors could be caused by POS errors, we also
compared the same baseNP chunker on the same corpus tagged with
i) the Brill tagger as used in [Ramshaw and Marcus, 1995], and
ii) the Memory-Based Tagger (MBT) as described in
[Daelemans et al., 1996]. We also present the results of
[Argamon et al., 1998], [Ramshaw and Marcus, 1995] and
[Cardie and Pierce, 1998] in Table 4. The latter two use a
transformation-based error-driven learning method [Brill, 1992].
In [Ramshaw and Marcus, 1995] the method is used for NP chunking,
and in [Cardie and Pierce, 1998] the approach is indirectly used
to evaluate corpus-extracted NP chunking rules.

As [Argamon et al., 1998] used only POS information for their MBSL
chunker, we also experimented with that option (POS only in the
table). Results show that adding words provides useful information
for MBL (see Table 4).

                              Subjects               Objects
# subsequences                3044                   1626
Method                        prec.  rec.   F_{β=1}  prec.  rec.   F_{β=1}
A,D&K                         88.6   84.5   86.5     77.1   89.8   83.0
IGTree                        79.9   71.7   75.6     84.4   85.8   85.1
IB1-IG                        84.7   81.6   83.1     87.3   85.8   86.5
IB1-IG POS only               83.5   77.9   80.6     76.1   83.3   79.6
IB1-IG without chunks         29.2   24.4   26.6     85.0   18.5   30.4
IB1-IG with treebank chunks   89.4   88.6   89.0     91.9   91.3   91.6

Table 5: Comparison of MBL and MBSL on subject/object detection as formulated by Argamon et al.



Subject/object detection 

For subject/object detection, we trained our algorithm on sections
01-09 of the WSJ and tested on Argamon et al.'s test data (section
00). We also used the treebank POS tags instead of MBT. For
comparability, we performed two separate learning experiments. The
verb windows are defined as reaching only to the left (up to one
intervening baseVP) in the subject experiment and only to the
right (with no intervening baseVP) in the object experiment. The
relational output of MBL is converted to the sequence format used
by MBSL. The conversion program first selects one relation in case
of coordinated or nested relations. For objects, the actual
conversion is trivial: the V-O sequence extends from the verb up
to the head (seen the old man for the example sentence in section
Data Format). In the case of subjects, the S-V sequence extends
from the beginning of the baseNP of the head up to the first
non-modal verb in the baseVP (My sisters have). The program also
uses filters to model some restrictions of the patterns that
Argamon et al. used for data extraction. For example, they
extracted only objects that immediately follow the verb.

The results in Table 5 show that highly comparable results can be
obtained with MBL on the (impoverished) definition of the
subject-object task. IB1-IG as well as IGTree are better than MBSL
on the object data. They are however worse on the subject data.
Two factors may have influenced this result. Firstly, more than
17% of the precision errors of IB1-IG concern cases in which the
word proposed by the algorithm is indeed the subject according to
the treebank, but the corresponding sequence is not included in
Argamon et al.'s test data due to their restricted extraction
patterns. Secondly, there are cases for which MBL correctly found
the head of the subject, but the conversion results in an
incorrect sequence. These are sentences like
"All [NP the man NP] [NP 's friends NP] came." in which all is
part of the subject while not being part of any baseNP.

Apart from using a different algorithm, the MBL experiments also
exploit more information in the training data than MBSL does.
Ignoring lexical information in chunking and subject/object
detection decreased the F_{β=1} value by 2.5% for subjects and
6.9% for objects. The bigger influence for objects may be due to
verbs that take a predicative object instead of a direct one.
Knowing the lexical form of the verb helps to make this
distinction. In addition, time expressions like "(it rained) last
week" can be distinguished from direct objects on the basis of the
head noun. Not chunking the text before trying to find subjects
and objects decreases F-values by more than 50%. Using the
"perfect" chunks of the treebank, on the other hand, increases F
by 5.9% for subjects and 5.1% for objects. These figures show how
crucial the chunking step is for the success of our method.

General 

Clear advantages of MBL are its efficiency (especially when using
IGTree), the ease with which information apart from POS tags can
be added to the input (e.g. word information, morphological
information, wordnet tags, chunk information for subject and
object detection), and the fact that NP and VP chunking and
different types of relation tagging can be achieved in one
classification pass. It is unclear how MBSL could be extended to
incorporate other sources of information apart from POS tags, and
what the effect would be on performance. More limitations of MBSL
are that it cannot find nested sequences, which nevertheless occur
frequently in tasks such as subject identification^4, and that it
does not mark heads.



4 e.g. [SV John, who [SV I like SV], is SV] angry. 



Conclusion 

We have developed and empirically tested a memory-based learning
(MBL) approach to shallow parsing in which POS tagging, chunking,
and identification of syntactic relations are formulated as
memory-based modules. A learning approach to shallow parsing
allows for fast development of modules with high coverage,
robustness, and adaptability to different sublanguages. The
memory-based algorithms we used (IB1-IG and IGTree) are simple and
efficient supervised learning algorithms. Our approach was
evaluated on NP and VP chunking, and subject/object detection
(using output from the chunker). F_{β=1} scores are 93.8% for NP
chunking, 94.7% for VP chunking, 77.1% for subject detection and
79.0% for object detection. The accuracy and efficiency of the
approach are encouraging (no optimisation or post-processing of
any kind was used yet), and comparable to or better than
state-of-the-art alternative learning methods.

We also extensively compared our approach to a recently proposed
new memory-based learning algorithm, memory-based sequence
learning (MBSL, [Argamon et al., 1998]), and showed that MBL,
which is a computationally simpler algorithm than MBSL, is able to
reach similar precision and recall when restricted to the MBSL
definition of the NP chunking, subject detection and object
detection tasks. More importantly, MBL is more flexible in the
definition of the shallow parsing tasks: it allows nested
relations to be detected; it allows the addition and integration
into the task of various additional sources of information apart
from POS tags; it can segment a tagged sentence into different
types of constituent chunks in one pass; it can scan a chunked
sentence for different relation types in one pass (though
separating subject-verb detection from object-verb detection is
surely an option that must be investigated).

In current research we are extending the approach to other types
of constituent chunks and other types of syntactic relations.
Combined with previous results on PP-attachment
[Zavrel et al., 1997], the results presented here will be
integrated into a complete shallow parser.